Statistical Applications in Genetics and Molecular Biology

Authors

  • Stephen O. Nyangoma
  • Antoine A. H. C. van Kampen
  • Theo H. Reijmers
  • Natalia I. Govorukhina
  • Ate G. J. van der Zee
  • Lucinda J. Billingham
  • Rainer Bischoff
  • Ritsert C. Jansen
Abstract

Liquid Chromatography Mass Spectrometry (LC-MS) is a powerful method for sensitive detection and quantification of proteins and peptides in complex biological fluids such as serum. LC-MS produces complex data sets, consisting of some hundreds of millions of data points per sample at a resolution of 0.1 amu in the m/z domain and 7000 data points in the time domain. However, the detection of lower-abundance proteins from these data is hampered by the presence of artefacts, such as high-frequency noise and spikes. Moreover, not all of the tens of thousands of chromatograms produced per sample are relevant to the search for biomarkers. Thus, in analysing LC-MS data, two critical pre-processing questions arise: (1) which of the thousands of chromatograms per sample are relevant for the detection of biomarkers, and (2) which of the thousands of signals per chromatogram are truly compound-related? Each question involves assessing the significance (deviation from noise) of multiple observations, so the issue of multiple comparisons arises. Current methods disregard the multiplicity and provide no concrete threshold for significance. With such procedures, the probability of one or more false positives is high because the number of tests to be performed is large, and this probability must be controlled. Since the cut-offs for declaring a chromatogram (or a signal) to be compound-related can hugely influence which proteins are detected, it seems natural to define thresholds that are neither arbitrary nor subjective. We suggest choosing thresholds guided by the critical aim of controlling the False Discovery Rate (FDR) in multiple hypothesis testing for significance over the large set of features produced per sample. This involves the use of regression diagnostics to characterize the signals of a chromatogram (e.g. as outliers or influential) and to suggest suitable test statistics for the multiple testing procedures (MTP) that discriminate noise and spikes from true signals. The role of Generalized Linear Models (GLM) in this MTP is investigated. The method is applied to LC-MS datasets from trypsin-digested serum spiked with varying levels of horse heart cytochrome C (cytoc).

∗This work was funded by the BioRange program of the Netherlands Bioinformatics Centre (NBIC), which is supported by a BSIK grant through the NGI. We gratefully acknowledge support from two Groningen Bioinformatics Centre funds, the Bolgewasen project (grant number SENTER-TSGE3043) and the Tropisch Fruit project (grant number SENTER-TSOM3005), and also the funding by MRC UK (grant No. RRAK11686). RB was funded by the Dutch Cancer Fund (KWF grant number RUG 2004-3165).
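The idea of choosing a significance threshold by controlling the FDR over many tests can be illustrated with a short sketch. The abstract does not say which FDR procedure the authors use, so the Benjamini-Hochberg step-up rule below is an assumption chosen for illustration. The function name `benjamini_hochberg`, the level `q`, and the p-values are all hypothetical and do not come from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True where the null is rejected
    under the Benjamini-Hochberg step-up rule at FDR level q.

    Assumption: this is a generic FDR procedure, not necessarily the
    one used in the paper.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the hypotheses with the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per chromatogram tested against noise.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p, q=0.05)
print(flags)
```

In this toy input only the two smallest p-values fall under their rank-dependent cut-offs, so only those two features would be declared compound-related. The cut-off adapts to the number of tests rather than being fixed in advance.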

Related articles

Strategies and Clinical Applications of Next Generation Sequencing

Abstract: DNA sequencing is one of the most valuable techniques in molecular biology and can be used to determine the sequence of nucleotides in a DNA fragment. High-throughput sequencing, known as Next Generation Sequencing (NGS), has revolutionized genomic research and molecular biology; as a result, the whole human genome can be sequenced at low cost in several days. NGS technology is simi...

SLC2A4 Polymorphisms Can Be a New Molecular Biomarker for Sports Genomics

This item is an editorial article and has no abstract.

Statistical Applications in Genetics and Molecular Biology

This note is a comment on the article “Dimension Reduction for Classification with Gene Expression Microarray Data” that appeared in Statistical Applications in Genetics and Molecular Biology (Dai et al., 2006).

Expression Analysis of PKS13, FG08079.1 and PKS10 Genes in Fusarium graminearum and Fusarium culmorum

Background: Identification and quantification of mycotoxins produced by Fusarium species are important in controlling fungal diseases. Objectives: The potential for zearalenone, butenolide and fusarin C production was investigated in five Fusarium graminearum and five F. culmorum isolates at the molecular level. Materials and Methods: The presence of the PKS13, FG08079.1 and PKS10 genes, associated with produ...

Molecular Epidemiology of Breast Cancer among Iranian-Azeri Population based on P53 Research

Background: This study aimed to improve our understanding of the molecular and epidemiological features of breast cancer among the Azeri population, with special emphasis on the detection of TP53 mutations. We also analyzed the P53 codon 72 polymorphism (rs1042522) and its role in susceptibility to breast cancer. Methods: ...

Publication date: 2008